ABSTRACT

The oral communication course for English majors at 
the National University of Malaysia includes testing designed by 
faculty and coordinated with the curriculum. This practice is based
on the ideas that a teacher who has been actively involved in
curriculum design is in a good position to design a test for that 
curriculum, and that teacher-made tests have a beneficial backwash 
effect on student learning. The course features two levels of 
instruction, each taught over two consecutive semesters. Final tests 
for both levels sample global communicative ability. Because the 
approach is communicative, the examinations are a series of tests
administered throughout the semester, allowing for continuous 
feedback to aid instruction. At level 1, the tests focus on three 
speaking tasks: extended, impromptu speech; group discussion; and an 
end-of-semester project. The tasks test three modes of speech: 
talking about oneself, others, experiences; narrating and describing 
events; and expressing and justifying opinions. At level 2, tests 
focus on group discussion, public speaking, debating, and an 
end-of-semester project. Rating scales have been constructed for all
tests based on the types of communicative ability required. 
Continuous testing has reduced test anxiety. Test development is 
ongoing. (MSE) 
The Teacher As Tester

There is evidence that not only are teachers good judges of behaviour, they
are also reliable judges of test performances (Callaway, D R 1980). However, it
would be quite naive and perhaps even imprudent to suggest then that all
teachers will also by extension make naturally good testers, given Spolsky's (1975)
rhetoric on whether testing is art or science. Nevertheless, it can be assumed
that a teacher who has been actively involved in course design, or better still in
the privileged position of 'negotiating' the curriculum with her students, would at
least have a blueprint of sorts as a starting point for the construction of tests for
that course. This could be further enhanced if the process is subjected to
friendly criticism, at the very least by other members of staff, in relation to the
objectives of the course or curriculum as a whole. The teacher is then in the
informed and educated position of being able to translate the objectives of the
course into test construction by linking the specific objectives of the course with
the task specifications identified. The test would then be underpinned by at least
a view of language learning, even if not a full-fledged theory, in a clear case of
doing the best that can be done. The analogy is best supplied by Skehan (1988),
who summarized the current state of the art on (communicative) testing:

"...Since ... definitive theories do not exist, testers have to do the best they
can with such theories as are available."

The contention therefore is that the teacher who has had some
responsibility for course design and implementation is pre-eminently
qualified to construct tests for the course, particularly if it is backed by
experience and shared knowledge in the field. Since the target group is known and
to hand, needs can be fairly accurately specified on the basis of introspection
and experience. The backwash effect of teacher-made tests on teaching can only
be beneficial. As the teacher in this case is also responsible for course content
(and like all other teachers across the board has the best interests of her students
at heart), she will certainly teach what is to be tested, test what is taught and
'bias for best' in the use of test procedures and situations. The only possible
danger lurking in this happy land is the possibility of a teacher who willy-nilly
teaches the test as well and thereby nullifies its value as a measuring instrument.



BACKGROUND

The Target Group 

At the English Department of the National University of Malaysia (UKM), 
students in the second year of the B A in English Studies program are required 
to take both levels 1 and 2 of an oral communication course that straddles two 
semesters or one academic session. These students are viewed as potential 
candidates for the B A in English Studies degree and there is a tremendous
responsibility (equally shared by the writing and reading courses) to improve 
their language ability to make them "respectable" (Nair-Venugopal, S, 1988) 
candidates for the program. This may be seen as the perceived and immediate 
need. The projected or future need is seen as a high level of language ability 
that also makes for good language modelling as there is evidence that many of 
these students upon graduation enroll for a diploma in Education and become 
English language teachers. The mature students in the course are invariably 
teachers too. The responsibility is even more awesome given the language 
situation in the country which while overtly ESL also manifests many hybrids of 
the ESL/EFL situation, notwithstanding government efforts at promoting 
English as an important second language. These students (except those who are 
exempted on the basis of a placement test and have earned credits equivalent to 
the course) are also subject to a one year fairly intensive preparatory proficiency 
program (twelve hours per week). The emphasis in this course is on an 
integrated teaching of the four language skills. These students have also had a 
minimum of eleven years of instruction in English as a subject in school. There 
is also invariably the case of the mature student who has probably had 'more' 
English instruction, having been subject chronologically to a different system of 
education in the country's history. 



Course Objectives 



The oral communication course comprises two levels, each level taught
over two semesters consecutively. The general aim of Level I is to provide a
language learning environment for the acquisition of advanced oral skills and
that of Level II to augment and improve upon the skills acquired in Level I, thus
providing a learning continuum for the acquisition of advanced oral skills. At
this juncture it must be pointed out that in the integrated program of the first
year there is an oral fluency component. In other words the students in the
second year have already been thrown into the 'deep end' as it were and the
assumption is that upon entry to Level I they have more than banal or survival
skills in oral communication. The reality is that students, in spite of the first year
of fairly intensive instruction and exposure, enter the second year with varying
levels of abilities. The task at hand for the second year oral skills programme is
quite clear: raise levels of individual oral ability, bridge varying levels of
individual abilities and yet help students to develop at their own pace. Hence the
need to see the language class as a language acquisition environment, bearing in
mind that contact and exposure with the language outside the class is not
optimal. The main objective in Level I is to achieve a high level of oral fluency
in the language with an accompanying level of confidence and intelligibility, the
latter being viewed with some urgency since native vernaculars are increasingly
used for social communication outside the classroom and Bahasa Malaysia
remains the language of instruction for courses in all other disciplines. The
objective of Level II is to achieve a high level of oral language ability. Both these
objectives are further broken down into specific objectives for both levels. The
tests are pegged against these objectives.

The specific objectives of Level I of the course are as follows:

1 attain high levels of intelligibility in speech 

2 comprehend standard varieties of the spoken language without difficulty 

3 interact and converse freely among themselves and other speakers of the 
language 

4 convey information, narrate and describe; express and justify opinions.

These objectives are realized through an eclectic methodology using a 
variety of instructional devices, classroom procedures and multimedia materials. 

The second objective is realized largely through practice in the language
laboratory and it is not tested, ie. elicited for as a skill domain, in the tests that
have been developed for the course. While it is generally accepted that listening
comprehension as a skill is not easy to teach, it is even more elusive to test.
According to Brown, G. and Yule, G. (1983),

"...a listener's task performance may be unreliable for a number of
reasons... we have only a very limited understanding of how we could
determine what it is that listening comprehension entails. Given these
two observations, it would seem that the assessment of listening 
comprehension is an extremely complex undertaking". 

Having said that, why then has listening comprehension been included as a
desirable objective on the course? As the view of language underlying the
course is that of communication, no course that purports to teach oral
communication (which view of language surely sees listening as a reciprocal
skill) can justifiably not pay attention to teaching it at least. Objective 3 is
specifically tested as speech interaction in the form of group discussions and 4 as
extended "impromptu" speech in 3 modes. Objective 1 is rated as a variable of
performance for both these test types. Objective 4 is also subsumed as 'enabling'
skills in the group discussion test.

Objectives for level 2 are as follows: 

1 not only comprehend all standard varieties of the language but also make
themselves understood to other speakers of the language without difficulty. 

2 participate in discussions on topics of a wide range of general interest
without hesitation or effort 

3 speak before audiences confidently (as in public speaking/platform 
activities) 

4 convey information, persuade others and express themselves effectively as 
users of the language (as in debates and forums) 

These objectives are achieved through the use of a selection of instructional 
devices, classroom procedures and modes such as simulations, small group 
discussions, debates and public speaking. 

Objective 2 is tested using the group discussion test. Objectives 3 and 4, to borrow
Tarone's notion (1982/83) of a "continuum of interlanguage styles", are to be
seen as examples of "careful styles" and are tested as formal modes of speaking
and debates. Objective 4 is also elicited as performance variables in the group
discussion test. The second part of objective 1, ie. intelligibility/comprehensibility,
operates as an important variable in assessing the performance of all these tests. The
final tests for both levels sample global communicative ability in the rehearsed
speech genre, which is an oral newsmagazine presentation on tape for the first
level and a videotaped presentation for the second level of either one of two
platform activities or a chat show. Both are take-home, end-of-semester
projects.



THE TESTS 



Some Considerations 

"In constructing tests, it is essential to have a defined curriculum or a set 
body of knowledge from which testers determine what to test" (Shohamy, E
1988).

To echo Charles Alderson (1983) the most important question to be asked 
of any test is, "What is it measuring?" which "can be determined by a variety of 
means including face inspection". Needless to say there are two other questions 
that merit equal consideration. One is, how is it measured and perhaps more 
crucially why? With reference to these tests, the question "for whom" ie. the 
target group has already been answered. As for purpose, each test type is seen 
as having a specified purpose that corresponds to an ability in an oral skill 
domain that has been delineated in the course objectives. Task specifications are
prescribed by the oral skills domains. Therefore each test would sample 
different behaviour or skills in the form of different speech modes and the task 
specifications will vary from test type to test type. However all tests will test for 
both linguistic and communicative ability. 

"It is difficult to totally separate the two criteria, as the linguistic quality of 
an utterance can influence comprehensibility, the basic communicative
criterion. Further, while a major goal of most college or secondary
language programs is communicative ability in the target language, there 
is justifiable concern with linguistic correctness because ...we are not just 
attempting to teach survival communications..., we are also trying to teach 
literacy in another language". Bartz W H (1979) 



It is quite clear that as the view of the language underlying the teaching is 
communicative and the view of language learning, that of acquisition, 
achievement tests administered both mid-way and at the end of each semester 
will not allow the teacher to obtain feedback on acquired ability which could be 
used for diagnostic purposes as well (particularly at entry from the first level to
the second), nor allow for a 'profiling' of performance. Hence the need for and
the development of a continuous 'battery' of tests, spaced out in relation to their
ordering on the course and as spelt out by the course objectives. These have 
been conceptualized as oral skills domains and rated accordingly. 

"...Advances in the state of the art of achievement testing are directly 
related to advances in the concept of skills domains on which student 
achievement is assessed". Shoemaker (cited by Swain M. 1980) 

The tests are administered at various points in the semesters that roughly 
coincide with points on the course where the skills to be tested have already been
taught or practised. The course provides ample opportunity in the practice of 
these skills. Such an ordering on the learning continuum had implications for 
the content validity of the tests where,

"Content validity refers to the ability of a test to measure what has been
taught and subsequently learned by the students. It is obvious that
teachers must see that the test is designed so that it contains items that
correlate with the content of instruction. Thus it follows that unless
students are given practice in oral communication in the foreign language
classroom, evaluation of communication may not be valid...." Bartz, W H
(1979).

By spacing out the tests in relation to the content, not only is the teacher-
tester able to 'fit' the test to the content, she is also able after each test to obtain
valuable feedback for the teaching of the subsequent domains that have been
arranged in a cyclical fashion. Hence learning and performance is also on a 
cumulative basis because each skill taught and learnt or acquired presupposes 
and builds on the acquisition and the development of the preceding skills. It is 
on these bases that the tests have been developed and administered over a 
period of time. They are direct tests of performance that are communicative in 
nature and administered on a cumulative basis as part of on-going course 
assessment for both levels. The test formats and methods of elicitation owe
much to some knowledge in the field (particularly the state of the art), test 
feedback, student introspection and teacher retrospection and experience with its 
full range of hunches and intuition. 
Test Types



Level I 

Level I as mentioned earlier consists of three test types.

1 Extended/'impromptu' speech

2 Group discussion

3 End-of-semester project

There are three speaking tasks of this type. Students speak for about 2
minutes on the first, 2-3 on the second and 3-5 on the third. The tasks test for
three modes of speech as follows: 

(i) Talking about oneself, others and experiences 

(ii) Narrating and describing incidents and events 

(iii) Expressing and justifying opinions. 

1 (i) and (ii) are tested at the beginning of the first level mainly for diagnostic
purposes as the students are of heterogeneous levels of proficiency. The
speeches are staggered for both (i) and (iii) to ensure that each student has a
minimum of a minute or so to prepare mentally for the topic. For (ii) they are
all given an equal amount of time to prepare mentally and to make notes. When
the testing begins they listen to each other speak, as the audience, thus providing
the motivation and a 'valid' reason as it were for the task. (iii) is tested before
the second half of the semester, to obtain information on learned behaviour as
the students have had sufficient practice in expressing and justifying opinions
through reaching consensus in group work. The topics for (i) and (ii) are well
within the students' realm of experience and interest such as

The happiest day in my life. 

The person who has influenced me the most.

However the topics for (iii) are of a slightly controversial nature such as 

Should smoking be banned in all public places? 
Do women make better teachers? 



Both (ii) and (iii) are rated for global ability to communicate in the mode
which is the overall ability of the student to persuade or justify reasons taken for 
a stand in the case of the latter and to describe, report and narrate in the case of 
the former. 

2 The group discussion test is administered in the second half of the semester
as by this time there has been plenty of practice in the interaction mode, as the
modus operandi of Level 1 is small group work. It tests specifically for oral
interaction skills. The topics for group discussion tests are also based on the
tacit principle that the content should be either familiar or known and not pose
problems in the interaction process. Though the amount of communication (size
of contribution) and substantiveness are rated as criteria, content per se is not
rated. Group discussion in Level 1 tests lower order interaction skills that are
discernible at the conversational level.

The group discussion test has been modelled on the lines of the Bagrut
group discussion test with some modifications (see Shohamy, E., Reves, T. and
Bejarano, Y. 1986 and Gefen, R. 1987). In Level 1 the topics are of matters
that either concern or pose a problem to the test takers as UKM students.
Hence there is sufficient impetus to talk about them and this 'guarantees'
initiation by all members of the group in the discussion. Topics in the form of
statements are distributed just before the tests from a prepared pool of topics.
Each topic comes with a set of questions. Students are allowed to read the
questions in advance but discussion on the topic and questions before the test is
not permitted. These questions function as cues to direct and manage the 
interaction. They need not be answered. In fact students may want to speak on 
other aspects of the topic. An example of the topic and questions is as follows: 

Scholarships should be awarded on need and not on merit. 

(a) Are both equally important considerations?

(b) Should students have a say in who gets scholarships ie. have student 
representatives on scholarship boards? 

(c) Do generous scholarships make students dependent on aid? 

(d) Are repayable-upon-graduation loans better than scholarships as more 
students can benefit? 

Groups are small and students are divided (depending on class size) into 4-5
(maximum) students per group. It has been possible to establish a rough ratio
between rating time per test-taker and their number per group. Groups of 4
took 15-20 minutes to round off the discussion and groups of 5 took about 20-25
minutes. However, it is desirable not to cut off the discussion after 20-25
minutes, as extra time (usually an extra 5 minutes) helped to confirm ratings.
Rating is immediate on the score sheets prepared for the test (see Appendix C
ii). A variation of the topics with maximum backwash effect on learning is to use
books that have been recommended for extensive reading as stimulus for group 
discussion. This has been trialled as a class activity. 

It can be seen that the oral interview test is noticeably absent in the 
sampling of speech interactions for Level I of the course and probably begs the
question why, as it is a common and well established test for testing oral 
interaction. Suffice to say that it is firstly one of the tests administered in the 
first year integrated program (and therefore sampled). Secondly the group 
discussion appears to be a more valid (face and content) test of oral interaction 
in relation to the course objectives. 

3 Since a premium is placed on intelligibility/comprehensibility the end-of- 
semester project tests for overall verbal communicative ability in the rehearsed 
speech genre in the form of a news magazine that is audio taped for assessment
and review. The news magazine may be presented either as a collage of items of 
news and views of events and activities on campus or thematically eg. sports on 
campus, cultural activities, student problems etc. 



Level II 

This level consists of 4 test types. 

1 Group discussion

2 Public speaking 

3 Debates 

4 End-of-semester project 

1 In the second level the group discussion test is administered early in the 
semester and the results used to determine how much more practice is needed in 
improving interaction skills before proceeding to the more formal performance-
oriented speech genres. The topics for the group discussion in the second level
are of a more controversial nature than in the first. Although cognitive load is
expected to be greater in the tests, procedures for test administration and
scoring are the same.

2 Public speaking is tested mid-way in the second semester after lecture-
demonstrations and a series of class presentations. As a test of global
communication skills, both verbal and non-verbal, it represents fairly high level 
order skills on the language learning continuum assumed for the course. Like 
debates, it is a sample of rehearsed speech in a formal situation. It is also viewed 
as a necessary advanced oral skill. Examples of topics are, 

Mothers should not go out to work. 

Alcoholism is a worse social evil than drug abuse.

3 The debate is placed at the end of the semester and usually viewed by the 
students as a finale of sorts of their oral communication skills. As with the 
public speaking test, topics and teams (for the debates) are made known well in
advance and students work on the topics cooperatively for the latter. The 
backwash effect on the acquisition of social and study skills is tremendous as 
students are informed that ratings reflect group effort in the debating process. 
Both tests 2 and 3 are rated immediately and video taped for both review and 
record purposes. 

4 The end-of-semester project can take two forms: that of a platform
activity (in the public speaking mode) or a chat show (speech interaction). Both
test for skills learned or acquired during the course. The platform activity and
the formal speech situation can be either an appeal (for blood donation, funds,
etc) or the promotion of a product/service or idea. The chat show tests for oral
interaction in the form of an extended interview of a 'celebrity'. Both tests 
simulate real life situations and allow for creativity and flexibility in that students 
can assume personae. 



Criteria and Rating Scales 

"Testers should construct their own rating scales according to the purpose 
of the test" (Shohamy, E. 1988)

Rating scales have been constructed for all the tests developed. A look at
the criteria and the rating scales (see appendices) for the various tests discussed
above shows that the criteria for each test vary, although some (mainly
linguistic) recur, as each test samples different types of communicative ability.

Working over a period of time (ie two years = four semesters) it has been 
possible to specify what criteria should be used to rate each test and therefore 
what sorts of rating scales to produce. It has also been possible to select specific
descriptors [...] an immediate judgement is no mean task. [...] descriptive
qualitative scales, more parsimonious rating [...] hand in hand with a checklist
of what are essentially holistic criteria, which will vary according to test purpose.
[...] depending on the test, the tester rates analytically on [...] 'weak', 'fair'
and 'good'. These scales are also [...] the absence of banded [...] which provide
guidelines to help the rater [...] allows the tester to make relevant remarks of
each test [...] effect on performance [...].

The problem [...] is that the descriptors may not fit the description of the
individual student, in that some of the performance variables [...] while others
may be [...] pigeon-holing. However it is possible to categorize [...] on a
broad basis as 'weak', 'fair' and 'good' [...] analytically on weighted 6 point
scales [...] describe them [...] and may even [...]. At least they prevent
stereotyping of students by not assigning their performance to prescriptive
ready-made bands.



CONCLUSION 
Test Anxiety 

[...] anxiety has been removed from the testing situations [...]
because of the wider sampling of the speech genres.

"There is evidence in the literature that the format of a task can unduly
affect the performance of some candidates. This makes it necessary to
include a variety of test formats for assessing each construct... In this
case, candidates might be given a better chance of demonstrating
potentially differing abilities" (Weir, C. 1989).

Practitioners know that not only do levels of test anxiety vary from situation
to situation and from testee to testee, it may not even be possible to eliminate
anxiety as an affective variable. However, in order to further reduce test anxiety
and to 'bias for best', students are informed at the beginning of each level about
course objectives and expectations, and test types and task specifications are explained.
Feedback is also provided after each test although actual scores obtained are not 
divulged. 



Other Matters 



All tests of courses on the university curriculum (cumulative or otherwise) 
are seen as achievement tests with scores and grades awarded accordingly. 
There is a certain amount of tension between rating according to specified 
criteria and the subsequent conversion of the weightage of the components of
these criteria into scores. However despite this constraint it is still possible to 
speak of a student's profile of performance in the oral communication class from 
level to level. At the end of the second year similar judgements can be made of 
them as potential students for the B A in English Studies. 
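
The tension noted above, between rating on specified criteria and converting the weighted components into a score, can be illustrated with a small sketch. It is illustrative only: the criterion names echo criteria mentioned in this paper and the 6 point scale follows the rating scales described earlier, but the weights, the percentage conversion and the band cut-offs are assumptions and are not the values used on the course.

```python
from typing import Dict

MAX_POINTS = 6  # each criterion is rated on a 6 point scale

# Hypothetical integer weights; the actual weightage is given only in the
# paper's appendices, which are not reproduced here.
WEIGHTS: Dict[str, int] = {
    "intelligibility": 4,
    "amount_of_communication": 3,
    "substantiveness": 3,
}


def weighted_score(ratings: Dict[str, int], weights: Dict[str, int] = WEIGHTS) -> float:
    """Convert per-criterion ratings (1-6) into a percentage score."""
    if set(ratings) != set(weights):
        raise ValueError("ratings must cover exactly the weighted criteria")
    for criterion, points in ratings.items():
        if not 1 <= points <= MAX_POINTS:
            raise ValueError(f"{criterion}: rating {points} is outside 1-{MAX_POINTS}")
    raw = sum(weights[c] * ratings[c] for c in weights)
    return 100.0 * raw / (MAX_POINTS * sum(weights.values()))


def broad_band(score: float) -> str:
    """Map a percentage onto a broad category (cut-offs are assumed)."""
    if score < 50:
        return "weak"
    if score < 75:
        return "fair"
    return "good"


def profile(ratings: Dict[str, int]) -> Dict[str, object]:
    """Keep the analytic ratings next to the converted score, so that the
    profile of performance is not lost when a grade is awarded."""
    score = weighted_score(ratings)
    return {"ratings": dict(ratings), "score": round(score, 1), "band": broad_band(score)}
```

Keeping the per-criterion ratings alongside the converted score is one way of retaining a student's profile even though the university requires a single grade.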

The oral communication course has also been offered more recently as an 
elective to other students and therefore involves more teachers. While the 
difference in clientele does change some of the course's methodological 
perspectives, the objectives have still been maintained as needs are broadly
similar. The tests are now being subjected to a process of small-scale teacher 
validation since the question of some extrapolation is apparent. There have been 
informal training and practice sessions for the teachers in the use of the criteria 
and rating scales. Past samples of performance have been reviewed to arrive at 
bench marks and pre-marking sessions held to increase intra- and inter-rater
reliability. The intersubjectivity and teacher feedback on all these aspects are
invaluable in improving the efficacy of the tests as instruments, at least with
reference to face and content validity. Obviously more work has to be done
before anything conclusive can be said.